Discovering the mesoscale for chains of conflict

Abstract

Conflicts, like many social processes, are related events that span multiple scales in time, from the instantaneous to multi-year development, and in space, from one neighborhood to continents. Yet, there is little systematic work on connecting the multiple scales, formal treatment of causality between events, and measures of uncertainty for how events are related to one another. We develop a method for extracting causally related chains of events that addresses these limitations with armed conflict. Our method explicitly accounts for an adjustable spatial and temporal scale of interaction for clustering individual events from a detailed data set, the Armed Conflict Location & Event Data Project. With it, we discover a mesoscale ranging from a week to a few months and tens to hundreds of kilometers, where long-range correlations and nontrivial dynamics relating conflict events emerge. Importantly, clusters in the mesoscale, while extracted from conflict statistics, are identifiable with mechanisms cited in field studies. We leverage our technique to identify zones of causal interaction around conflict hotspots that naturally incorporate uncertainties. Thus, we show how a systematic, data-driven, and scalable procedure extracts social objects for study, providing a scope for scrutinizing and predicting conflict and other processes.

of areas of a government's direct control (for example, proxy militias in Somalia which hold territory independently but are allied with the Federal Government)."

B. Algorithm for generating conflict avalanches

We devise a systematic method to generate conflict avalanches for different levels of resolution, which are indexed by combinations of spatial and temporal scales.

• Choice of scale
To set the spatial scale, we divide Africa into Voronoi cells [...]

[...] spatiotemporal bin. Each spatiotemporal bin can have one of two values, one or zero. One represents presence of conflict and zero represents absence. Every spatiotemporal bin which has value one is called a "packet" of conflict event(s). Each column here is a binary time series of an individual spatial cell. For example, the spatiotemporal bins in the red box form the time series for spatial bin number 1.

[...] is asymmetric, we must calculate it in both directions for every pair. If the magnitude of transfer entropy is significant in both directions, we get a bi-directional causal link as we show in Figure 4. If the magnitude is significant only in one direction, we get a uni-directional causal link. We do this for all adjacent pairs of cells to construct a causal network.

We search for a self-causal loop (an edge from a spatial cell at time t to itself at time t + 1) using the transfer entropy, which reduces to the mutual information in Eq 1.

• Clustering events
We cluster together every pair of conflict events that satisfies one of the three following conditions: [...]

Once every pair has been clustered (some will remain alone), we have conflict avalanches.
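The pairwise clustering step can be sketched as a connected-components computation. This is only an illustration under stated assumptions: the three linking conditions are not reproduced here, so `linked_pairs` is a hypothetical input listing the index pairs of events that already satisfy one of them, and `cluster_events` is a name introduced for this sketch rather than part of our pipeline.

```python
def cluster_events(n_events, linked_pairs):
    """Group events into clusters given pairs that satisfy a linking condition.

    `linked_pairs` is a hypothetical input: an iterable of (i, j) event indices
    that meet at least one clustering condition. Events that appear in no pair
    remain alone as single-event clusters.
    """
    parent = list(range(n_events))  # union-find forest, one node per event

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in linked_pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i  # merge the two clusters

    clusters = {}
    for i in range(n_events):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

For example, `cluster_events(6, [(0, 1), (1, 2), (4, 5)])` groups events 0, 1, and 2 together because linking is transitive, leaves event 3 alone, and pairs events 4 and 5.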

In principle, one could have constructed the "causal" network using other measures of temporal predictability such as Granger causality, time-delayed correlation, and time-delayed mutual information. We do not consider Granger causality because the variables we consider are not Gaussian, the assumption underlying that measure. While the latter two measures do not explicitly distinguish the directionality of time because they are time-symmetric, we can still measure asymmetric information between sites by testing one site to be the past of the other and vice versa.

We show in Figure S2 the resulting networks from the time-delayed pairwise correlation in panel a and from the time-delayed mutual information in panel b. In the same way as with the transfer entropy calculation, we flag an edge as significant if it is of higher value than 95% of bootstrapped time-shuffles. Unsurprisingly, the alternative measures return dense concentrations of links in similar areas as with the transfer entropy. In contrast, the alternative networks are denser both within the conflict hotspots and in more remote regions, suggesting that the transfer entropy provides a more discriminatory approach on which to build conflict avalanches.

Table S1. Definitions of distributions used to fit conflict properties (7).
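As a rough sketch of the alternative measure, the time-delayed mutual information between two binary site series can be estimated by counting and then compared against time-shuffled nulls. The function names, the lag of one bin, the choice to shuffle only the earlier series, and the number of shuffles are assumptions made for this illustration and are not taken from our analysis code.

```python
import math
import random
from collections import Counter


def delayed_mutual_information(past, future, lag=1):
    """Plug-in mutual information (bits) between past[t] and future[t + lag].

    `lag` must be at least 1. Swapping the arguments tests the other
    direction, i.e. which site is treated as the past of the other.
    """
    pairs = list(zip(past[:-lag], future[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    n_past = Counter(a for a, _ in pairs)
    n_future = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2(c * n / (n_past[a] * n_future[b]))
        for (a, b), c in joint.items()
    )


def significant_delayed_mi(past, future, lag=1, n_shuffles=200,
                           quantile=0.95, seed=0):
    """True if the observed value exceeds `quantile` of time-shuffled values."""
    rng = random.Random(seed)
    observed = delayed_mutual_information(past, future, lag)
    shuffled = list(past)
    n_below = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # destroys temporal order, keeps the event rate
        if delayed_mutual_information(shuffled, future, lag) < observed:
            n_below += 1
    return n_below / n_shuffles >= quantile
```

Calling `significant_delayed_mi(a, b)` and `significant_delayed_mi(b, a)` for a pair of neighboring cells gives the two directed tests described above.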

Looking over all the data points, we obtain the likelihood of the set of data

$$L = \prod_{k=1}^{K} p(x_k),$$

where k is an index over the K data points, and the probability according to the model is p(x_k). Since the number of data points that we fit depends on the model (for example, the power law comes with a lower cutoff), we compute the typical log-likelihood for each data point that was fit by normalizing by the total number of data points considered, K, such that we have log L/K.
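To make the normalization concrete, the following sketch fits a continuous power law above a lower cutoff and returns log L/K. It assumes the standard continuous form p(x) = ((α − 1)/x_min)(x/x_min)^(−α) for x ≥ x_min with its closed-form maximum-likelihood exponent; the definitions in Table S1 may differ in detail (for instance for discrete variables), and the function names are hypothetical.

```python
import math


def fit_power_law(data, x_min):
    """Maximum-likelihood exponent of a continuous power law above x_min.

    Returns (alpha, tail), where tail holds the K points with x >= x_min.
    Assumes at least one point lies strictly above x_min.
    """
    tail = [x for x in data if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    alpha = 1.0 + len(tail) / log_sum
    return alpha, tail


def mean_log_likelihood(tail, alpha, x_min):
    """Typical log-likelihood per fitted data point, log L / K."""
    log_l = sum(
        math.log((alpha - 1.0) / x_min) - alpha * math.log(x / x_min)
        for x in tail
    )
    return log_l / len(tail)
```

Because only the K points above the cutoff enter the sum, dividing by K keeps the value comparable with models that are fit to a different number of points.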

In our procedure, we would like to verify if the tail of the distribution resembles a power law beyond what we might reasonably expect from a comparison with the null models. This goal indicates that we should compare the fit to the tail of the distribution conditioned to be above the power law's lower cutoff. As we show in Figure S6, we find that the power law is superior in the tail to the lognormal distribution in much of the mesoscale. In each box, we indicate with the color the typical gain in log-likelihood we have with the power law model taken over the random Voronoi tessellations and with the number the fraction of times the power law is superior; we have excluded cases where the number of data points K < 20 or where the data fit to the power law span less than a decade, since the statistics become unreliable. The lognormal tends to be superior for short time scales a (left side of graphs), but when it is overall better in the mesoscale the difference is slight. In Figure S7, we show the exponential is always worse. In this sense, we rely on the power law as a useful and approximate scaling hypothesis on which to relate the conflict properties to one another above some minimal scale.
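The tail comparison can be illustrated with the exponential alternative, which has a closed-form fit once it is conditioned to lie above the cutoff. This is a sketch under assumptions, not our fitting code: the lognormal case is omitted because its truncated fit needs numerical optimization, the continuous power-law form is assumed, and `tail_gain_over_exponential` with its parameters is a hypothetical name. The exclusion rules (fewer than 20 points, or less than a decade of range) follow the text.

```python
import math


def tail_gain_over_exponential(data, x_min, min_points=20, min_decades=1.0):
    """Per-point log-likelihood gain of a power law over an exponential tail.

    Both models are conditioned on x >= x_min. Returns None when the tail has
    fewer than `min_points` points or spans less than `min_decades` decades.
    A positive value means the power law describes the tail better.
    """
    tail = [x for x in data if x >= x_min]
    k = len(tail)
    if k < min_points or math.log10(max(tail) / x_min) < min_decades:
        return None

    # Continuous power law: p(x) = (alpha - 1) / x_min * (x / x_min) ** -alpha
    mean_log = sum(math.log(x / x_min) for x in tail) / k
    alpha = 1.0 + 1.0 / mean_log
    ll_power = math.log((alpha - 1.0) / x_min) - alpha * mean_log

    # Shifted exponential: p(x) = rate * exp(-rate * (x - x_min))
    mean_excess = sum(x - x_min for x in tail) / k
    rate = 1.0 / mean_excess
    ll_exponential = math.log(rate) - rate * mean_excess

    return ll_power - ll_exponential
```

Averaging this gain over many tessellations, and counting how often it is positive, would give the two quantities reported in each box of the comparison figures.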

F. Population density
To gain intuition about how the separation scales b and a relate to other social and geographic factors, we look at a map of population centers from the data set Africapolis. Africapolis considers a population center to be an "urban agglomeration" if the population exceeds $10^4$ and there is no gap greater than 200 meters between built spaces (8). Population counts are extracted using census data and the built space is determined from satellite imagery. This provides a systematic and universal definition of a city, called an "urban agglomeration," which does not depend on the vagaries of country records, datasets, and administrative definitions.

Fig. S6. Comparison between power law and lognormal distribution for (a) fatalities, (b) reports, (c) sites, (d) duration. Each block is centered at a spatiotemporal scale (see Appendix I for more details) at which we compare the log-likelihood of the two models. The green line encircles the mesoscale. Colors show the difference between the log-likelihood of the power law and of the lognormal distribution averaged over 100 pseudorandom Voronoi tessellations, i.e. red means that the power law is on average better. The number inside each spatiotemporal block shows the fraction of pseudorandom Voronoi tessellations for which the power law is better. Not shown are scales with either fewer than 20 avalanches or a range of less than a decade above the power law minimum.

[...] Figure S6 for (a) fatalities, (b) reports, (c) sites, (d) duration.

[...] are equal to one. The actor overlap score Ω is calculated by taking the mean over all the elements of the matrix M (see Figure S9).

H. Statistical causality and transfer entropy
In 1956, Wiener formulated causality in terms of predictability: "For two simultaneously measured signals, if we can predict the first signal better by using the past information from the second one than by using the information without it, then we call the second signal causal to the first one" (9). Later, Granger formulated it mathematically by introducing a statistical concept of causality based on the evaluation of predictability, which is now commonly known as "Granger causality" (10). Transfer entropy is an information-theoretic measure that relaxes some of the assumptions of Granger causality (GC). Namely, transfer entropy neither requires an explicit model (GC assumes a linear relationship between the predicted and predicting variables) nor assumes normality in their distributions.
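For binary conflict time series, a model-free estimate of this kind can be obtained by counting. The sketch below is a simple plug-in estimator with one time step of history for both series, T = Σ p(x₁, x₀, y₀) log₂[p(x₁ | x₀, y₀) / p(x₁ | x₀)], together with a time-shuffle test at the 95% level. The history length, the choice to shuffle only the source series, the number of shuffles, and all function names are assumptions for illustration and may differ from the estimator used in our analysis.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in transfer entropy (bits) from `source` to `target`.

    Measures how much source[t] improves the prediction of target[t + 1]
    beyond what target[t] already provides.
    """
    triples = list(zip(target[1:], target[:-1], source[:-1]))  # (x1, x0, y0)
    n = len(triples)
    n_x1x0y0 = Counter(triples)
    n_x0y0 = Counter((x0, y0) for _, x0, y0 in triples)
    n_x1x0 = Counter((x1, x0) for x1, x0, _ in triples)
    n_x0 = Counter(x0 for _, x0, _ in triples)
    total = 0.0
    for (x1, x0, y0), c in n_x1x0y0.items():
        p_given_both = c / n_x0y0[(x0, y0)]
        p_given_self = n_x1x0[(x1, x0)] / n_x0[x0]
        total += (c / n) * math.log2(p_given_both / p_given_self)
    return total


def is_significant(source, target, n_shuffles=200, quantile=0.95, seed=0):
    """True if the estimate exceeds `quantile` of time-shuffled estimates."""
    rng = random.Random(seed)
    observed = transfer_entropy(source, target)
    shuffled = list(source)
    n_below = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if transfer_entropy(shuffled, target) < observed:
            n_below += 1
    return n_below / n_shuffles >= quantile


def link_type(x, y, **kwargs):
    """Classify the causal link between two cells from both directed tests."""
    x_to_y = is_significant(x, y, **kwargs)
    y_to_x = is_significant(y, x, **kwargs)
    if x_to_y and y_to_x:
        return "bi-directional"
    if x_to_y:
        return "x -> y"
    if y_to_x:
        return "y -> x"
    return "none"
```

Because the measure is asymmetric, `transfer_entropy(x, y)` and `transfer_entropy(y, x)` generally differ, which is why both directions are tested for every adjacent pair of cells.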

In short, transfer entropy is an information-theoretic quantity [...]

[...] of this analysis is shown in Figure S11. Each block corresponds to a spatiotemporal scale (see Appendix I for more details). The validity of the exponent relation is checked for 100 pseudorandom Voronoi tessellation realizations. If more than 95% of these random realizations satisfy the exponent relation, we conclude that the exponent relation is significant (blue); otherwise, it is not (red).